Materialisation and data partitioning algorithms for distributed RDF systems
Abstract
Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of the data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. While numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against state-of-the-art systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to state-of-the-art min-cut partitioning, while computing the partitions requires considerably less time.
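To make the notion of materialisation concrete, the following is a minimal single-machine sketch, not the distributed algorithm from the paper. It assumes one hypothetical rule (transitivity of an illustrative `partOf` predicate) and repeatedly applies it until no new triples are derived; the names `apply_transitivity`, `materialise`, and the sample facts are all assumptions made for illustration.

```python
# Illustrative sketch only: naive fixpoint materialisation on a single machine.
# The rule (transitivity of a hypothetical "partOf" predicate) is an assumption,
# not a rule taken from the paper.

def apply_transitivity(triples, predicate):
    """Return triples derivable in one step by (x p y), (y p z) -> (x p z)."""
    # Index objects by subject so the join on y is a dictionary lookup.
    by_subject = {}
    for s, p, o in triples:
        if p == predicate:
            by_subject.setdefault(s, set()).add(o)
    derived = set()
    for s, p, o in triples:
        if p == predicate:
            for z in by_subject.get(o, ()):
                derived.add((s, predicate, z))
    return derived


def materialise(triples, predicate):
    """Apply the rule repeatedly until no new triples appear (a fixpoint)."""
    store = set(triples)
    while True:
        new = apply_transitivity(store, predicate) - store
        if not new:
            return store
        store |= new


facts = {("a", "partOf", "b"), ("b", "partOf", "c"), ("c", "partOf", "d")}
closure = materialise(facts, "partOf")
```

After this preprocessing step, a query such as "is `a` part of `d`?" is answered by a direct lookup in `closure`, with no rule evaluation at query time. In a distributed setting the join partners for a rule may reside on different servers, which is where communication and synchronisation costs arise.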
Related resources
Adaptive Partitioning for Very Large RDF Data
State-of-the-art distributed RDF systems partition data across multiple computer nodes (workers). Some systems perform cheap hash partitioning, which may result in expensive query evaluation, while others apply heuristics aiming at minimizing inter-node communication during query evaluation. This requires an expensive data pre-processing phase, leading to high startup costs for very large RDF k...
Efficient Task Partitioning Algorithms for Distributed Shared Memory Systems
In this paper, we consider the tree task graphs that arise from many important programming paradigms, such as divide and conquer and branch and bound, and the linear task graphs that stem from common computation schemes, such as pipelining and iterative calculation. The target architecture considered is a distributed shared memory architecture with indirect network or wormhole-routed direct ...
Partitioning bin-packing algorithms for distributed real-time systems
Embedded real-time systems must satisfy not only logical functional requirements but also para-functional properties such as timeliness, Quality of Service (QoS) and reliability. We have developed a model-based tool called Time Weaver which enables the modeling of functional and para-functional behaviors of real-time systems. It also performs automated schedulability analysis, and generates glu...
Online Data Partitioning in Distributed Database Systems
Most previous studies on automatic database partitioning focus on deriving a (near-)optimal (re)partition scheme for a specific pair of database and query workload, and overlook the problem of how to efficiently deploy the derived partition scheme into the underlying database system. In fact, (re)partition scheme deployment is often non-trivial and challenging, especially in a dis...
Partitioning Templates for RDF
In this paper, we present an RDF data distribution approach which overcomes the shortcomings of the current solutions in order to scale RDF storage both with the volume of data and query requests. We apply a workload-aware method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. In order to avoid exhaustive analysis on large datasets, a s...
Journal
Journal title: Journal of Web Semantics
Year: 2022
ISSN: 1570-8268, 1873-7749
DOI: https://doi.org/10.1016/j.websem.2022.100711